KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS by aiquestion · Pull Request #12140 · apache/kafka

aiquestion · 2022-05-09T16:22:34Z

This PR adds a missing part of #11451.

The previous change was meant to solve https://issues.apache.org/jira/browse/KAFKA-13419, but the final code did not reset the generation ID when SyncGroup received a REBALANCE_IN_PROGRESS error.
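To make the intended behavior concrete, here is a minimal, self-contained sketch of a group member's generation state that drops its generation ID, but keeps its member ID, when SyncGroup fails with REBALANCE_IN_PROGRESS. In the actual patch this happens through `resetStateOnResponseError(ApiKeys.SYNC_GROUP, error, false)` in `AbstractCoordinator`. The sketch only models the effect that the PR's unit test asserts (generation ID back to the "no generation" value, member ID unchanged). All names here (`GenerationResetSketch`, `SyncGroupError`, `onSyncGroupError`) are hypothetical, and the `-1` sentinel for "no generation" is an assumption of this sketch.

```java
import java.util.Objects;

// Hypothetical, simplified model of a group member's generation state.
// Not Kafka's AbstractCoordinator: names and the -1 sentinel are assumptions.
public class GenerationResetSketch {

    // Assumed sentinel meaning "no valid generation".
    static final int NO_GENERATION_ID = -1;

    // Hypothetical subset of errors a SyncGroup response can carry.
    enum SyncGroupError { NONE, REBALANCE_IN_PROGRESS }

    // Immutable snapshot of the member's generation.
    static final class Generation {
        final int generationId;
        final String memberId;

        Generation(int generationId, String memberId) {
            this.generationId = generationId;
            this.memberId = Objects.requireNonNull(memberId);
        }
    }

    private Generation generation;

    GenerationResetSketch(int generationId, String memberId) {
        this.generation = new Generation(generationId, memberId);
    }

    Generation generation() {
        return generation;
    }

    // Handles the error carried by a SyncGroup response.
    void onSyncGroupError(SyncGroupError error) {
        if (error == SyncGroupError.REBALANCE_IN_PROGRESS) {
            // The group began another rebalance, so the generation that was sent is stale.
            // Drop the generation ID but keep the member ID (the PR's unit test expects
            // the member ID to survive the reset).
            generation = new Generation(NO_GENERATION_ID, generation.memberId);
        }
        // Other errors are out of scope for this sketch and leave the state unchanged.
    }

    public static void main(String[] args) {
        GenerationResetSketch state = new GenerationResetSketch(5, "consumer-1");
        state.onSyncGroupError(SyncGroupError.REBALANCE_IN_PROGRESS);
        // prints "-1 consumer-1"
        System.out.println(state.generation().generationId + " " + state.generation().memberId);
    }
}
```

Replacing the whole `Generation` object, instead of mutating one field, keeps the ID pair consistent for any reader that grabbed a reference earlier, which is a common pattern for this kind of state.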

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

dajac · 2022-05-09T18:51:17Z

Thanks for the patch. Could we file a Jira for it please?

aiquestion · 2022-05-10T12:19:03Z

Created a Jira for it: https://issues.apache.org/jira/browse/KAFKA-13891

aiquestion · 2022-05-10T12:44:49Z

@showuon, could you please help review this?

showuon

@aiquestion, thanks for the PR. Could we add a test for it?

showuon · 2022-05-10T12:58:32Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java

                } else if (error == Errors.REBALANCE_IN_PROGRESS) {
                    log.info("SyncGroup failed: The group began another rebalance. Need to re-join the group. " +
                                 "Sent generation was {}", sentGeneration);
+                    resetStateOnResponseError(ApiKeys.SYNC_GROUP, error, false);


We might need to add a comment here explaining why the generation ID has to be reset.

aiquestion · 2022-05-15

Added a comment and a unit test. Thanks~

showuon

LGTM! Thanks for the update. Left a minor comment.

showuon · 2022-05-16T07:46:53Z

clients/src/test/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinatorTest.java

+                AbstractCoordinator.Generation currentGeneration = coordinator.generation();
+                return currentGeneration.generationId == AbstractCoordinator.Generation.NO_GENERATION.generationId &&
+                        currentGeneration.memberId.equals(memberId);
+            }, 2000, "Generation should be reset");


nit: I see the 2000 timeout appears in many places in AbstractCoordinatorTest.java. Could we replace them with a static variable? Thanks.

aiquestion · 2022-06-12

I can't think of a good name for this 2000 timeout, so I just changed it to the rebalance timeout. Does that make sense? :-p

showuon · 2022-06-12

@aiquestion, the current REBALANCE_TIMEOUT_MS is 60 seconds, which means we'd wait up to 60 seconds for the generation reset. That's not correct; it should use 2 seconds as before. I think you can ignore my previous minor comment about the 2000 change and change REBALANCE_TIMEOUT_MS back to 2000. Thank you.

aiquestion · 2022-06-12

Ah, okay. Reverted. Thanks~
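The point of this exchange is that the 2000 ms wait and REBALANCE_TIMEOUT_MS (60 seconds in this test class, per the review) bound different things. The test in the PR appears to pass its condition lambda, the 2000 ms bound and a failure message to a wait utility from Kafka's test code. Reusing the rebalance timeout there would let a failing assertion block for up to a minute. The following is a minimal sketch of the originally suggested nit, a dedicated named constant plus a polling helper. The constant name `GENERATION_RESET_WAIT_MS` and the helper `waitForCondition` are hypothetical stand-ins with a similar shape, not Kafka's utility, and the 10 ms polling interval is an arbitrary choice.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForConditionSketch {

    // Hypothetical name for the 2000 ms bound on waiting for the generation reset.
    // Deliberately separate from the rebalance timeout (60 s in this test class),
    // which would let a failing assertion block for up to a minute.
    static final long GENERATION_RESET_WAIT_MS = 2000;

    // Arbitrary polling interval chosen for this sketch.
    static final long POLL_INTERVAL_MS = 10;

    // Polls `condition` until it holds, or throws AssertionError after `maxWaitMs`.
    static void waitForCondition(BooleanSupplier condition, long maxWaitMs, String message)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (!condition.getAsBoolean()) {
            // nanoTime is monotonic, so compare via subtraction to be overflow-safe.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(message + " (not met within " + maxWaitMs + " ms)");
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Becomes true after roughly 100 ms, well inside the 2000 ms budget.
        waitForCondition(
                () -> System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(100),
                GENERATION_RESET_WAIT_MS,
                "Generation should be reset");
        System.out.println("condition met");
    }
}
```

A passing test returns as soon as the condition holds, so the bound only matters when something is wrong. That is why a tight, purpose-specific constant is preferable to borrowing an unrelated, much larger timeout.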

aiquestion · 2022-06-12T08:57:42Z

> @aiquestion , thanks for the PR. Could we add a test for it?

@showuon, sorry for the delay. What should I do to get this PR merged? (It's my first time submitting a PR.)

showuon · 2022-06-13T02:39:15Z

Failed tests are unrelated:

    Build / JDK 11 and Scala 2.13 / kafka.server.MultipleListenersWithDefaultJaasContextTest.testProduceConsume()
    Build / JDK 11 and Scala 2.13 / kafka.server.UpdateFeaturesTest.testShouldFailRequestDuringDeletionOfNonExistingFeature()

CONFLUENT: Sync from apache/kafka trunk to confluentinc/kafka master (13 Jun 2022)

apache/trunk: (7 commits)

    KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN…(apache#12140)
    KAFKA-10000: Exactly-once source tasks (apache#11780)
    KAFKA-13436: Omitted BrokerTopicMetrics metrics in the documentation (apache#11473)
    MINOR: Use Exit.addShutdownHook instead of directly adding hooks to R…(apache#12283)
    KAFKA-13846: Adding overloaded metricOrElseCreate method (apache#12121)
    KAFKA-13935 Fix static usages of IBP in KRaft mode (apache#12250)
    HOTFIX: null check keys of ProducerRecord when computing sizeInBytes (apache#12288)

Conflicts: None


ableegoldman · 2022-10-27T00:10:38Z

Hey @dajac @showuon, I just came across this from a user who's running into it on 3.0. Given it was part of a series of fixes leading up to/included in 3.0, I think it can/should be backported to 3.2 - 3.0. Any concerns there?

Just lmk if there's any reason to be careful, or if changes are needed to backport the fix faithfully.


showuon · 2022-10-27T01:08:05Z

Agreed to backport to 3.2 - 3.0. Thanks.

ableegoldman · 2022-10-27T05:32:39Z

Thanks @showuon! Unfortunately, I'm now seeing that the situation may be more complicated than I'd initially thought :/ I just came across this follow-up to the patch here: https://issues.apache.org/jira/browse/KAFKA-14016

The original reporter/PR author actually suggests reverting the changes here and offers an alternative fix. I feel like I'm still catching up on the whole history here, but while I wrap my head around it, could you give this new ticket a look? Wondering what your take on this is.

Don't want to bias you with this, but FWIW, when I was first pointed to this PR, I was definitely skeptical of the changes, though I did ultimately convince myself it made sense. Now I'm letting my doubts creep back in lol



…LANCE_IN_PROGRESS (apache#12140)" (apache#12794) This reverts commit c23d60d. Reviewers: Luke Chen <showuon@gmail.com>

minor: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS

bc1d809

aiquestion marked this pull request as ready for review May 10, 2022 01:28

aiquestion changed the title ~~MINOR: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS~~ KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS May 10, 2022

showuon reviewed May 10, 2022


add test and add comment

8d425b0

showuon approved these changes May 16, 2022


aiquestion force-pushed the reset_generation_syncgroup_fail branch from 050c1ce to 8d425b0 June 12, 2022 10:35

showuon merged commit c23d60d into apache:trunk Jun 13, 2022

aiquestion deleted the reset_generation_syncgroup_fail branch June 13, 2022 03:21

aiquestion added a commit to aiquestion/kafka that referenced this pull request Jun 25, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA…

dce2d46

…LANCE_IN_PROGRESS (apache#12140)" This reverts commit c23d60d.

ableegoldman pushed a commit that referenced this pull request Oct 27, 2022

KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN…

7944d7b

…_PROGRESS (#12140) Reviewers: Luke Chen <showuon@gmail.com>

aiquestion added a commit to aiquestion/kafka that referenced this pull request Oct 28, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA…

b9a1c8d

…LANCE_IN_PROGRESS (apache#12140)" This reverts commit c23d60d.

aiquestion mentioned this pull request Oct 28, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA… #12794

Merged


showuon pushed a commit that referenced this pull request Nov 5, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA…

fcab5fb

…LANCE_IN_PROGRESS (#12140)" (#12794) This reverts commit c23d60d. Reviewers: Luke Chen <showuon@gmail.com>

showuon pushed a commit that referenced this pull request Nov 5, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA…

846f404

…LANCE_IN_PROGRESS (#12140)" (#12794) This reverts commit c23d60d. Reviewers: Luke Chen <showuon@gmail.com>

showuon pushed a commit that referenced this pull request Nov 5, 2022

Revert "KAFKA-13891: reset generation when syncgroup failed with REBA…

4f659db

…LANCE_IN_PROGRESS (#12140)" (#12794) This reverts commit c23d60d. Reviewers: Luke Chen <showuon@gmail.com>


4 participants
